ABSTRACT

We present a catalog of visual-like H-band morphologies of ~50,000 galaxies (F160W < 24.5) in the 5
CANDELS fields (GOODS-N, GOODS-S, UDS, EGS and COSMOS). Morphologies are estimated with Convolutional
Neural Networks (ConvNets). The median redshift of the sample is <z> ~ 1.25. The algorithm is trained on
GOODS-S, for which visual classifications are publicly available, and then applied to the other 4 fields.
Following the main CANDELS morphological classification scheme, our model retrieves, for each galaxy, the
probabilities of having a spheroid, having a disk, presenting an irregularity, being compact or a point source,
and being unclassifiable. ConvNets are able to predict the fractions of votes given a galaxy image with zero bias
and ~10% scatter. The fraction of misclassifications is less than 1%. Our classification scheme represents
a major improvement with respect to CAS (Concentration-Asymmetry-Smoothness)-based methods, which hit
a 20-30% contamination limit at high z. The catalog is released with the present paper via the Rainbow
database (http://rainbowx.fis.ucm.es/Rainbow_navigator_public/).

Subject headings: galaxies etc.


1. INTRODUCTION 

Since the pioneering works of E. Hubble in the first half of the 20th century,
galaxies have been classified according to their visual aspect
(see e.g. Hubble 1926, 1936). This very first optical classification
revealed that galaxies in the local Universe are broadly bimodal,
with or without a stellar disk (Hubble Fork). Understanding the
physical processes that lead to such a bimodality, i.e. how bulges
and disks form and evolve, is one of the major challenges in the
field of galaxy evolution and the main goal of deep field surveys.
Classification of galaxies at different cosmic epochs is therefore a
key step towards understanding how the progenitors of today's Hubble
Fork were shaped. The main difficulty is that this task is hampered
by the huge amount of data that is, and will be, available from
large galaxy surveys.

A question naturally arises: can human classifiers be
replaced by automatic techniques? Several groups have made efforts
in that direction, consisting in using existing visual morphologies
on a smaller dataset […]. The basic idea behind these approaches is to
find a set of parameters that correlate with the visual morphology
of a galaxy and to define the region of parameter space that best
characterizes a given morphological type (e.g. Abraham et al.
1996; Conselice et al. 2000; Lotz et al. 2008). In astronomy,
the parameters defining morphology traditionally include
concentration, asymmetry, clumpiness (or smoothness), the
Gini coefficient, moments of light, etc.


In the last years, we proposed a generalization of this approach
with the development of galSVM (Huertas-Company et al. 2008, 2009,
2011), which enables an n-dimensional classification with optimal
non-linear boundaries in the parameter space, as well as a
quantification of errors following a probabilistic approach (see
also Peth et al. 2015; Scarlata et al. 2007). These CAS
(Concentration-Asymmetry-Smoothness)-based methods have proven
relatively useful but are also affected by several limitations. The
values of the parameters strongly depend on data quality and
redshift, and they only provide rough morphological classifications
in 2 or 3 classes. The most evident shortcoming of such techniques
is that the fraction of misclassifications is high, especially at
high redshift (~20-30%, Huertas-Company et al. 2014). The latter is
possibly the main reason why their popularity within the
astronomical community is still quite low (see review by Ball &
Brunner 2010).

The problem might reside in the parameters that are
traditionally adopted. Concentration, asymmetry, etc., and by
extension principal components, are useful because they reduce the
complexity of the problem by globally describing a galaxy with just
a few parameters. At the same time, however, this approach neglects
an enormous amount of information contained in the pixels
themselves. As a consequence, CAS-based methods might not be suited
to capture the full, complex distribution of light in the way the
human brain does. Using all the pixels as the parameter space is now
possible with the advent of powerful computing resources such as
Graphics Processing Units (GPUs). At the same time, there exist very
powerful machine learning algorithms suited to mimic human
perception (such as deep learning), which are able to learn the best
set of parameters for a given problem. This new approach was first
used in astronomy at low redshift earlier this year, in the
framework of an online competition led by the Galaxy Zoo team (see
§ 3 for more details), yielding very promising results (Dieleman et
al. 2015, hereafter D15).

In this paper, we extend this new methodology to high
redshift by classifying ~50,000 galaxies with median
redshift <z> ~ 1.25 in the CANDELS fields, where detailed
visual classifications are available for a subsample of ~8,000
objects (Kartaltepe et al. 2014). We show that the use of
deep learning yields an almost contamination-free
classification that closely mimics human perception. The
resulting catalog for the 5 CANDELS fields (GOODS-S,
GOODS-N, UDS, EGS and COSMOS) is released with the
present work.

The paper is structured as follows. In section 2 we describe
the dataset. In section 3 we describe the method and how the
CANDELS data are pre-processed before being fed to the
algorithm. In sections 4 and 5 we discuss the performance and
accuracy of the resulting classification, and in section 6 we
describe the properties of the released catalog. We
conclude with a summary of the main results (section 7).

2. DATASET 

Our starting-point catalogs are the CANDELS public
photometric catalogs for UDS (Galametz et al. 2013) and
GOODS-S (Guo et al. 2013). Preliminary CANDELS catalogs
were used for COSMOS, EGS and GOODS-N (private
communication). We select all galaxies with F160W < 24.5 mag
(AB system), which is the magnitude limit imposed by
Kartaltepe et al. (2014) to perform reliable visual morphological
classifications. Since our goal is to provide a morphological
classification as close as possible to the visual one, we apply
the same selection criteria in all considered fields.


The resulting sample consists of ~50,000 galaxies, a factor of 5
more than the visual catalog published by CANDELS to date.
About 50% of the sources lie at 1 < z < 3 (Fig. 1), where the
CANDELS filters probe rest-frame optical morphologies. As
extensively discussed in Kartaltepe et al. (2014), the sample is
~80% complete down to log(M*/M☉) ~ 10 (see their figure 1).


3. CANDELS MORPHOLOGICAL CLASSIFICATION
WITH DEEP LEARNING 

3.1. Convolutional Neural Network (ConvNet) configuration 

In this work we mimic human perception with deep
learning using convolutional neural networks (ConvNets).
Although a complete description of how convolutional neural
networks work is clearly beyond the scope of the present paper,
we provide a brief introduction below. We refer the interested
reader to D15 for more details.

Deep learning is a methodology to automatically learn and
extract the most relevant features (or parameters) from raw
data for a given classification problem through a set of
non-linear transformations.
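As a concrete illustration of one such non-linear transformation, the NumPy sketch below implements a single convolution, rectification (ReLU) and max-pooling stage, the building block that ConvNets such as the D15 network stack several times. It is a minimal sketch under stated assumptions: the 40 × 40 stamp, the 5 × 5 kernel and the function names are hypothetical, the kernel is random here whereas a trained network learns its weights, and real implementations use optimized GPU libraries rather than Python loops.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation (what deep-learning libraries call convolution)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # weighted sum of the pixels under the kernel window
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit: the non-linearity applied after each convolution."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max-pooling; rows/columns that do not fill a window are dropped."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    x = x[:h, :w]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def conv_block(image, kernel, bias=0.0):
    """One convolution -> ReLU -> max-pool stage."""
    return max_pool(relu(conv2d_valid(image, kernel) + bias))

rng = np.random.default_rng(0)
stamp = rng.normal(size=(40, 40))   # hypothetical 40x40 galaxy stamp
kernel = rng.normal(size=(5, 5))    # learned during training in a real ConvNet
features = conv_block(stamp, kernel)  # 40 - 5 + 1 = 36 pixels, pooled by 2 -> 18x18
```

Stacking several such stages, each with many learned kernels, and ending with fully connected layers gives the kind of architecture shown in Fig. 2.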

Though deep learning architectures have existed since the
early 80s (Fukushima 1980), they involve complex
technological problems, and their application to massive
datasets has only become feasible in the last decade. Several
factors have contributed to the rise in their popularity: (i) the
availability of much larger training sets, with millions of
labeled examples[1]; (ii) powerful GPU implementations, making
the training of very large models practical; (iii) improved model
regularization algorithms, which helped reduce the computing time.

ConvNets have proven to perform extremely well in
image recognition tasks. For example, they have achieved an
error rate of 0.23% on the MNIST database, a collection of
handwritten digits considered a standard test for new machine
learning algorithms (Ciresan et al. 2012). When applied to facial
recognition, they achieve a 97.6% recognition rate on 5,600
images of more than 10 subjects (Matsugu et al. 2003). The
ImageNet Large Scale Visual Recognition Challenge is a benchmark
in object classification and detection, with millions of images
and hundreds of object classes. In Krizhevsky et al. (2012),
ConvNets reached an error rate of 15.3%, compared to 26.2% for
the second-best (non-deep) competitor. The performance of
convolutional neural networks on the ImageNet tests is now close
to that of a purely human classification (Russakovsky et al. 2014).


ConvNets were first applied to galaxy morphological
classification earlier this year in the framework of the Galaxy Zoo
Challenge on the Kaggle platform[2]. The aim of the challenge
was to find an algorithm able to predict the 37 vote fractions
of the Galaxy Zoo 2 release. The winner of the competition
used ConvNets to reach a final RMSE of ~7% on these parameters
(Dieleman et al. 2015). This work clearly showed that ConvNets
are a very promising tool for automated morphological
classification.

There is no clear methodology to find the optimal
convolutional neural network for a given problem other than trying
different configurations and comparing the outputs. The one
used for the Galaxy Zoo challenge provided excellent results
for a problem similar to ours (Fig. 2). We therefore decided to
use the D15 configuration to classify the CANDELS sample.
Given the different nature of SDSS and CANDELS images,
our approach, by design, requires specific pre-processing
steps, as discussed in section 3.3. This is certainly not the
cleanest approach, but it is sufficient for our classification
purposes, as discussed in subsequent sections.


3.2. Training set 

The ConvNet is trained to reproduce the CANDELS visual
morphological classification defined in Kartaltepe et al.
(2014). This classification is based on the efforts of 65
individual classifiers who contributed to the visual inspection
of all galaxies in the GOODS-S field (with an average of 3-5
classifiers per galaxy). The classifiers were asked to provide a
number of flags related to the galaxy's structure, morphological
k-correction, interaction status and clumpiness. As a result,
each galaxy in the catalog has a number of flags, which measure
the fraction of classifiers who selected a given morphological
feature. The classification was mainly performed in the H band
(F160W), even though each classifier had access to images of the
same galaxy at other wavelengths.

In this work, we focus on the main classification tree,
which defines the main morphological class (Fig. 3). For

[1] ConvNets are particularly sensitive to this, since the risk of over-fitting
is large given the complexity of the models.

[2] https://www.kaggle.com/c/galaxy-zoo-the-galaxy-challenge
Fig. 1. Redshift (left) and stellar mass (right) distributions of the selected sample for morphological classifications. The dataset contains more than 20,000
galaxies at z > 1, where the CANDELS fields probe rest-frame optical morphologies.


Fig. 2. Configuration of the Convolutional Neural Network used in this paper. The network is based on the one used by Dieleman et al. (2015) on SDSS
galaxies. It is made of 5 convolutional layers followed by 2 fully connected perceptron layers. The convolutional part also includes 3 max-pooling steps of
different sizes. The inputs are CANDELS galaxy stamps resized to SDSS-like dimensions, as explained in the text, and the output (for this paper) consists of 5 real
values corresponding to the fractions defined in the CANDELS classification scheme.
each galaxy there are therefore 5 parameters, f_spheroid, f_disk,
f_irr, f_ps and f_unc, which refer respectively to the frequency
at which human classifiers flagged a given galaxy as having a
spheroid, having a disk, having some irregularities, being a point
source (or unresolved), and being unclassifiable. It is important
to notice that the flags are not mutually exclusive (except for the
unclassifiable one), i.e. a galaxy can have both a disk and a
spheroid, or have a disk and be irregular, so the sum of all
frequencies for a given object is not necessarily one.
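The sketch below shows how such vote fractions could be computed from individual classifier flags and why they need not sum to one. The 0/1 flag matrix, the column order and the consistency check on the unclassifiable flag are illustrative assumptions based on the description above, not the actual format of the Kartaltepe et al. (2014) catalog.

```python
import numpy as np

# Column order assumed here: spheroid, disk, irregular, point source, unclassifiable
CLASSES = ("spheroid", "disk", "irr", "ps", "unc")

def vote_fractions(flags):
    """Fraction of classifiers raising each flag.

    flags: (n_classifiers, 5) array of 0/1 values, one row per classifier.
    Flags are not mutually exclusive, so the returned fractions need not sum to 1.
    """
    flags = np.asarray(flags, dtype=int)
    # The unclassifiable flag is the only exclusive one: such a row must carry no other flag
    unc_rows = flags[:, 4] == 1
    if np.any(flags[unc_rows, :4].sum(axis=1) > 0):
        raise ValueError("the unclassifiable flag cannot be combined with other flags")
    return flags.mean(axis=0)

# Hypothetical galaxy inspected by 4 classifiers
flags = [
    [1, 1, 0, 0, 0],  # spheroid + disk
    [0, 1, 0, 0, 0],  # disk only
    [0, 1, 1, 0, 0],  # disk + irregular
    [1, 1, 0, 0, 0],  # spheroid + disk
]
f = vote_fractions(flags)  # -> [0.5, 1.0, 0.25, 0.0, 0.0]
total = f.sum()            # -> 1.75, i.e. larger than one
```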

The main purpose of this work is to mimic human behavior.
In other words, we want the machine to be able to predict how
many people will vote for a given feature, given the galaxy
image. Recall that the objective we consider here is to replace
humans by computers, not to find the correct morphology of a
galaxy, which actually depends on the definition one wants to
adopt. Hence, if the visual classification is intrinsically
biased, so will be the machine-based one.

The classification in GOODS-S contains ~8,000 galaxies
for which the visual classification by (expert) humans is known,
so we can use part of this sample to train the machine learning
algorithm and keep a fraction for an independent test. Note also
that the visual classification of the UDS field was finalized
during the preparation of the present work, so it represents an
additional independent test of the classification, as discussed
later in the paper. In the following, we describe the
pre-processing applied to the images before feeding them to the
ConvNet.

3.3. Pre-processing 

As previously discussed, in this work we use the ConvNet
design presented in D15, which was optimized for the SDSS. There
are some obvious problems related to this approach, since
galaxies at high redshift are intrinsically smaller[3] and fainter.
Also, the training set is made of only ~8,000 galaxies from
GOODS-S with visual parameters, compared to the 60 × 10^3
galaxies used for the SDSS training. This last point is
particularly critical, since training the ConvNet with a
significantly smaller sample can easily lead to over-fitting,
i.e. too many parameters in the model compared with the number
of data points.

To mitigate these potential issues, we pre-processed the
training set before feeding it to the ConvNet, applying the
following steps (see Fig. 4):

• All galaxies in the GOODS-S visual morphology catalog
are interpolated to the typical SDSS size (i.e. ~40 pixels).
This is performed using classical cubic interpolation. The
procedure obviously introduces some redundancy in the data,
since we artificially reduce the pixel size, but it ensures that
the network sees the same ratio of background to galaxy pixels
as for the SDSS. This is important because the size of the
convolution box is fixed. An alternative approach would have
been to adapt the network size to the typical size of CANDELS
images. In any case, some interpolation is required given the
wide redshift range probed by the CANDELS data (z ~ 0.1 to
z ~ 3), over which the length scale changes by more than a
factor of 4. Therefore, even if the interpolation factor could
be decreased, interpolation is required at some level. In this
work, since we are
[3] Typically 5-10 pixels (~0.3"), compared to 40 pixels (~10") for
SDSS galaxies.


interested in broad morphologies, the impact of interpolation
is not a major issue, and we therefore decided to keep the
original network.

• Each galaxy is randomly rotated 3 times before being fed
to the network. Since our dataset is significantly smaller than
the one used in the Galaxy Zoo competition, there is a clear
risk of over-fitting in the classification process. We therefore
introduce additional redundancy in the training set to increase
the number of training points, taking advantage of the fact that
morphological classifications should be rotationally invariant
(Dieleman et al. 2015). As explained in D15, the algorithm itself
introduces additional redundancy by performing two more 90°
rotations.


• We then add some random Gaussian noise to each of the
rotated images so that the pixel values of each realization are
not exactly the same. The added noise is small enough not to
affect the visual aspect of the galaxy, but it slightly changes
the pixel values. This ensures that the redundancy is actually
effective and that the network considers each rotated galaxy as
a different object with very similar morphological parameters,
just as the human eye does. Finally, each of the rotated images
is converted to JPEG with a power-law stretch optimized for
astronomical images[4] (Bertin 2012) and a 10% compression. This
is important to keep the number of possible pixel values
reasonable and to obtain a similar normalization for all
galaxies. We stress again that, since we are interested here in
broad morphologies (disk vs. bulge, irregular, compact), the
impact of compression is not critical, as shown in subsequent
sections. For more detailed morphologies (e.g. LSB features,
bars, etc.), especially at high redshift, a careful investigation
of the optimal compression will certainly be required.


• The previous steps were repeated in three CANDELS filters
(F105W, F125W and F160W) to reach a final training set of
~58,000 galaxy images (8,000 × 3 (rotations) × 3 (filters), i.e.
at most 72,000), very close to the 60,000 SDSS objects for which
the network was designed. Note that the spatial coverage of the
three filters is not exactly the same, which explains why we only
reach ~60,000 images. The size of the dataset is enough to avoid
over-fitting and reach satisfactory results, as shown in the next
sections. The use of the same galaxies in three different filters
might introduce some biases, since the morphology might look
slightly different from one filter to the other. However,
Kartaltepe et al. (2014) show that the fraction of galaxies that
actually change their morphology between these 3 filters is very
small. In any case, we also tried the algorithm using only F160W
images (reducing the training set by a factor of 3), which led to
no significant changes in the final results (~0.01 change in the
final RMSE value).


• We finally introduce some noise in the visual parameters
of each galaxy (f_spheroid, f_disk, f_irr, f_ps and f_unc) by
adding a random Gaussian 10% scatter. This is done, firstly,
to make sure that the ConvNet does not

[4] http://www.astromatic.net/software/stiff
Fig. 3. CANDELS main morphology visual classification scheme as described in Kartaltepe et al. (2014). Each classifier (3-5 per galaxy on average) is
asked to provide, for each galaxy, 5 flags corresponding to the main morphological properties of the galaxy, as labeled in the figure. The flags are then combined
to produce the fractions of people who voted for a given feature.


see exactly the same data points for different redundant
images, and to force the optimization. Secondly, because the
CANDELS fractions are highly discretized, since the actual number
of classifiers per galaxy is rather small, and therefore the full
range of values from 0 to 1 is not covered. The 10% value is
calibrated empirically and is of the order of magnitude of the
intrinsic noise of the labels (assuming that they follow a
binomial distribution; see section […]). Below this value the
effect is almost negligible, and above it the original signal is
diluted. As we will show in section 5, this also has some
important consequences on the final output.
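The augmentation steps above can be sketched as follows. This is a minimal NumPy sketch under explicit assumptions, not the pipeline actually used: the stamp is assumed to be already resampled to ~40 × 40 pixels (cubic resampling could be done, for instance, with scipy.ndimage.zoom and order=3); rotations are restricted to multiples of 90° via np.rot90 to keep the example self-contained, whereas arbitrary angles would need an interpolating rotation such as scipy.ndimage.rotate; the pixel-noise amplitude (1% of the stamp's standard deviation) is a hypothetical choice; the 10% label scatter is interpreted as additive Gaussian noise with σ = 0.1 followed by clipping to [0, 1]; and the STIFF stretch and JPEG conversion are omitted.

```python
import numpy as np

def augment(stamp, labels, rng, n_rot=3, pixel_noise=0.01, label_noise=0.10):
    """Return n_rot augmented (image, labels) pairs for one galaxy stamp.

    stamp  : 2-D array, assumed already resampled to ~40x40 pixels
    labels : the 5 visual fractions (f_spheroid, f_disk, f_irr, f_ps, f_unc)
    """
    stamp = np.asarray(stamp, dtype=float)
    labels = np.asarray(labels, dtype=float)
    sigma_pix = pixel_noise * stamp.std()  # hypothetical: small enough to be visually negligible
    pairs = []
    for _ in range(n_rot):
        k = int(rng.integers(1, 4))        # assumption: rotate by 90, 180 or 270 degrees
        rotated = np.rot90(stamp, k)
        image = rotated + rng.normal(0.0, sigma_pix, size=rotated.shape)
        # additive Gaussian scatter on the labels, clipped to [0, 1] (assumption)
        noisy = np.clip(labels + rng.normal(0.0, label_noise, size=labels.shape), 0.0, 1.0)
        pairs.append((image, noisy))
    return pairs

rng = np.random.default_rng(42)
stamp = rng.normal(size=(40, 40))  # stand-in for a resampled CANDELS stamp
pairs = augment(stamp, [0.2, 0.8, 0.0, 0.0, 0.0], rng)

# Nominal training-set size: 8,000 galaxies x 3 rotations x 3 filters
n_nominal = 8000 * 3 * 3  # 72,000 before accounting for incomplete filter coverage
```

Adding independent pixel noise to every rotated copy is what prevents the network from seeing numerically identical inputs, which is the point of the noise step described above.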


The final dataset used for classification thus contains
~58,000 redundant JPEG images, of which 47,700 are
used for training the machine (i.e. finding the best model),
5,300 are used for real-time evaluation during model training
(validation dataset), and 5,000 are used to assess the
final accuracy of the best final model (test dataset). These
5,000 galaxies constitute the test sample and are not used at
all during the training process (but their visual morphology
is known), so they can be independently used to study the
behavior of the best trained model on an unknown dataset.
The final model is taken at 2500 chunks. As described in
Dieleman et al. (2015), to further improve the classification
accuracy, the predictions of 17 variants of the best model are
averaged as a post-processing step. These variants include
modifications such as the removal of dense layers, different
filter size configurations and different numbers of filters,
among others. We refer to Dieleman et al. (2015) for more
details. The best model followed by the averaging process is
then used to classify the other 4 CANDELS fields, in which the
visual morphology is not yet available. The classification runs
at a rate of ~1000 galaxies/hour on a Tesla M2090 GPU, which is
compatible with the treatment of the massive datasets expected
in the near future (e.g. Euclid, WFIRST).
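A sketch of the data split and of the final averaging step is given below. Two points are assumptions rather than documented details: the split is done per galaxy, so that all rotated and multi-filter copies of a galaxy land in the same subset (which is what keeps the test galaxies fully unseen during training); and the 17 model variants are combined with a plain unweighted mean of their predictions. The function names and the toy sample of 100 galaxies with 9 images each are hypothetical.

```python
import numpy as np

def split_by_galaxy(galaxy_ids, valid_frac, test_frac, rng):
    """Boolean masks (train, valid, test) over images, keeping all copies of a galaxy together."""
    galaxy_ids = np.asarray(galaxy_ids)
    unique = rng.permutation(np.unique(galaxy_ids))
    n_test = int(round(test_frac * unique.size))
    n_valid = int(round(valid_frac * unique.size))
    is_test = np.isin(galaxy_ids, unique[:n_test])
    is_valid = np.isin(galaxy_ids, unique[n_test:n_test + n_valid])
    is_train = ~(is_test | is_valid)
    return is_train, is_valid, is_test

def average_variants(predictions):
    """Unweighted mean over model variants; input shape (n_variants, n_galaxies, 5)."""
    return np.asarray(predictions, dtype=float).mean(axis=0)

rng = np.random.default_rng(1)
ids = np.repeat(np.arange(100), 9)  # 100 toy galaxies x (3 rotations x 3 filters)
# Subset fractions mirror 5,300 / 58,000 (validation) and 5,000 / 58,000 (test)
is_train, is_valid, is_test = split_by_galaxy(ids, 5300 / 58000, 5000 / 58000, rng)

variant_preds = rng.uniform(size=(17, 10, 5))  # 17 variants, 10 galaxies, 5 fractions
final = average_variants(variant_preds)
```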


The evolution of the root mean square error (RMSE) during
the final learning process for the training and validation
datasets is shown in Figure 5. The difference in RMSE on the
validation dataset over the last 10 iterations is of the order of
10^-[…], confirming that the algorithm has converged. There is
no significant over-fitting, given the convergence of the
validation set's RMSE. As expected, the RMSE for the training
set is slightly smaller (by ~0.01), as this is the data directly
used to fit our ConvNet model (recall that the validation dataset
is used for real-time evaluation of the model on unseen data). We
also show in Figure 5 the values of the RMSE for the test sample
before and after averaging. As explained above, this third
dataset is needed to assess the final RMSE of the model, as it
may happen that the 2500 chunks we use for convergence are
over-fitted to the validation dataset. The RMSE over the test
set is very consistent with the one obtained on the validation
dataset. Averaging slightly reduces the RMSE, by ~10^-[…],
consistent with the values reported in Dieleman et al. (2015).
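For reference, the RMSE used throughout and a simple convergence test of the kind described above might look as follows. The RMSE is computed over all galaxies and all five fractions; the window of 10 evaluations follows the text, while the tolerance tol is a hypothetical parameter, not the threshold actually used.

```python
import numpy as np

def rmse(predicted, visual):
    """Root mean square error over all galaxies and all five fractions."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(visual, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def has_converged(valid_rmse_history, window=10, tol=1e-3):
    """True when the validation RMSE varies by less than tol over the last `window` evaluations."""
    history = np.asarray(valid_rmse_history, dtype=float)
    if history.size < window:
        return False
    recent = history[-window:]
    return bool(recent.max() - recent.min() < tol)

# Hypothetical learning curve flattening out near RMSE ~ 0.13
history = [0.25, 0.20, 0.16] + [0.1300 + 1e-5 * i for i in range(10)]
```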

We verified that each of the pre-processing steps
described above results in a decrease of the average root mean
square error (RMSE) on the validation and test samples. More
precisely, before any pre-processing, the average RMSE is ~0.25.
Adding noise to the labels decreases the error to ~0.22.
Interpolation brings it down to ~0.17 and, finally, redundancy
together with the addition of noise brings it to the final value
of ~0.13 (Fig. 5).


4. ACCURACY 
4.1. Recovering votes 

Figure 6 shows the relation between the visual fractions
for each galaxy provided in Kartaltepe et al. (2014), once the
random shifts have been applied, and the predicted values for
the main classification tree (f_spheroid, f_disk, f_irr, f_ps and
f_unc). We only plot in Figure 6 objects in the test sample (5,000
objects), which were not used for training, in order to assess
the behavior of the machine on an unknown dataset. Results
in terms of bias and scatter are also tabulated in Table 1. There
is a clear one-to-one correlation between the automatically
derived quantities and the visual ones. These values show that
the typical bias and dispersion are lower than 10%. It is important
to keep in mind that the distribution of frequencies is not
uniform between 0 and 1 (some bins contain very few objects),
and the machine is therefore optimized to minimize the global
bias. In fact, the median bias and scatter for all morphological
frequencies are even smaller, ranging between 0 and 0.02 and
between 0.03 and 0.1, respectively, as shown in the table. If we
plot galaxies in the training set instead, the scatter is almost
the same, as expected from the learning histories shown in
Figure 5. This confirms that the model is well optimized and
that there is no over-fitting (Fig. 7).
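The binned bias, RMSE and scatter reported in the tables can be computed as sketched below. The definitions are assumptions for illustration: the residual is the predicted minus the visual fraction, the bias is its mean, and the scatter is its standard deviation, in which case RMSE² = bias² + scatter² holds exactly within each bin; the estimator used for the published scatter may differ. The toy fractions are hypothetical.

```python
import numpy as np

def binned_stats(visual, predicted, edges=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """(lo, hi, bias, rmse, scatter) of predicted - visual in bins of the visual fraction."""
    visual = np.asarray(visual, dtype=float)
    resid = np.asarray(predicted, dtype=float) - visual
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # the last bin is closed on the right so that f = 1.0 is included
        upper = (visual <= hi) if hi == edges[-1] else (visual < hi)
        r = resid[(visual >= lo) & upper]
        if r.size == 0:
            rows.append((lo, hi, np.nan, np.nan, np.nan))
            continue
        rows.append((lo, hi, r.mean(), np.sqrt(np.mean(r ** 2)), r.std()))
    return rows

# Hypothetical visual and predicted spheroid fractions for five galaxies
rows = binned_stats([0.1, 0.1, 0.5, 0.9, 1.0], [0.2, 0.0, 0.5, 0.8, 0.9])
for lo, hi, bias, rms, scatter in rows:
    print(f"{lo:.1f}-{hi:.1f}  bias={bias:+.2f}  rmse={rms:.2f}  scatter={scatter:.2f}")
```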


Despite the scatter, it is important to notice that the tails
in the distributions seen in Fig. 6 do not necessarily imply
misclassifications as we currently define them, i.e. galaxies
that clearly fall in the wrong morphological class after visual
inspection. As a matter of fact, a galaxy with a slightly larger
bulge probability in the automated scheme than in the purely
visual classification will still be clearly classified as a disk
if its disk probability is much higher. Figure 8 shows the
relation between the maximum visual frequency, defined as the
maximum frequency irrespective of the morphology for each
galaxy, and the maximum automatic frequency. Both quantities are
correlated with the expected scatter and show no tails, even
though there seems to be an increasing bias at low frequencies
(f_max < 0.5). This is not surprising, since those are the most
ambiguous objects of the visual catalog.

(Fig. 4 panels: conversion to JPEG with random color perturbation; × 3 filters (F160W, F125W, F105W); ~60,000 galaxies.)
Fig. 4. Pre-processing of the CANDELS stamps before being fed to the convolutional neural network. Galaxies are first interpolated so that they all have
similar sizes. In a second step, we add some redundancy to the data by performing random rotations in order to avoid over-fitting, and the images are finally
converted to JPEG. This is repeated for 3 CANDELS filters. See text for details.

We also explore in Figure 9 how the performance of the
classification depends on physical properties such as redshift,
magnitude and size relative to the PSF FWHM. Interestingly,
we do not observe any particular trend in the bias or the
scatter with magnitude and redshift. The bias in the
morphological fractions stays < 0.05, and the scatter is rather
constant at ~0.1 for all magnitudes and redshifts spanned by our
sample. Only very small objects, close to the size of the PSF,
or very large ones (> 4 times the PSF size) have a larger bias
(~0.05-0.1). For large objects, this could be explained by
the fact that part of the wings might be lost during the
interpolation process at fixed size. Recall that this does not
necessarily mean that the morphology can be assessed equally
well regardless of brightness, redshift or size, but rather that
the algorithm is able to reproduce the visual classification (with
its possible biases) with the same accuracy.

4.2. Recovering dominant classes and misclassifications

An important measurement in any automated classification
scheme is the fraction of objects that are misclassified,
i.e. objects that fall in a different morphological class
in the automated classification compared to the visual one.
Since both classifications are continuous, in the sense that
each galaxy has 5 real numbers associated with it, the answer to
this question strongly depends on how the class boundaries
are defined.

In order to provide an estimate of this misclassification
rate that can be compared to previous classification methods,
we select objects that have a clear dominant class (DC)
in both the automatic and visual classifications. We consider
that a galaxy has a dominant class if one frequency is
considerably larger than the other four. We then check whether
both dominant classes match.

We adopt here a conservative offset of 0.5 between the
highest and the second-highest frequency as the criterion to
identify galaxies with a clear dominant morphology; e.g. if
f_max = 0.75, the second-largest frequency has to be smaller
than 0.25. There are therefore 5 dominant classes: dominant
spheroid, dominant disk, dominant irregular, dominant point
source and dominant unclassifiable. The results of such a
comparison are shown in Figure 10. The degree of agreement in
the identification of the main morphology of a galaxy is
~97-100%.
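The dominant-class criterion and the resulting agreement rate can be written compactly as below. This is a sketch of the criterion as stated (a gap of more than 0.5 between the largest and second-largest fraction), with hypothetical function names and toy fractions; galaxies without a dominant class in either classification are simply excluded from the comparison.

```python
import numpy as np

CLASSES = ("spheroid", "disk", "irregular", "point source", "unclassifiable")

def dominant_class(fractions, offset=0.5):
    """Index of the dominant class, or None if the top fraction does not exceed
    the second-highest one by more than `offset`."""
    f = np.asarray(fractions, dtype=float)
    order = np.argsort(f)[::-1]  # indices sorted by decreasing fraction
    return int(order[0]) if f[order[0]] - f[order[1]] > offset else None

def dominant_class_agreement(visual, automatic, offset=0.5):
    """Fraction of matching dominant classes among galaxies that have one in BOTH schemes."""
    matches = []
    for v, a in zip(visual, automatic):
        dv, da = dominant_class(v, offset), dominant_class(a, offset)
        if dv is not None and da is not None:
            matches.append(dv == da)
    return float(np.mean(matches)) if matches else float("nan")

visual = [[0.9, 0.2, 0.0, 0.0, 0.0], [0.1, 0.9, 0.3, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0, 0.0]]
automatic = [[0.85, 0.25, 0.0, 0.0, 0.0], [0.2, 0.8, 0.1, 0.0, 0.0], [0.6, 0.4, 0.0, 0.0, 0.0]]
rate = dominant_class_agreement(visual, automatic)  # third galaxy has no DC and is skipped
```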

Fig. 5. Time trajectories for the training (dotted blue line) and validation (solid red line) sets (see text for details). The RMSE is computed every 60 chunks.
The blue/red stars indicate the values computed with the final model (2500 chunks) on the training and test samples, respectively, after averaging, as reported in
Table 1. The empty star shows the RMSE on the test sample before averaging.

More generally, we can also investigate how the global
classification accuracy depends on the level of agreement
between the classifiers. As shown in Dieleman et al. (2015) for
the SDSS classification, objects for which a large number of
people provided the same classification are better recovered
than those with a uniform distribution of frequencies. This
simply reflects the fact that galaxies that are not easily
classified by humans are also hard to recover for the
classification model. Following the same approach as D15, we
define the level of agreement a between classifiers for a
5-class problem:

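$$ a = 1 - \frac{H(f)}{\log 5} $$
where H(f) is the entropy, defined as:

$$ H(f) = -f_{\rm spheroid}\log f_{\rm spheroid} - f_{\rm disk}\log f_{\rm disk} - f_{\rm irr}\log f_{\rm irr} - f_{\rm ps}\log f_{\rm ps} - f_{\rm unc}\log f_{\rm unc} \qquad (1) $$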

The agreement parameter a ranges between 0 and 1, with
large values indicating a high level of agreement (most of the
classifiers selected the same class) and low values associated
with objects with low levels of agreement (the votes are
distributed uniformly among the different classes).
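Equation (1) translates directly into code. In this sketch the fractions are first rescaled to sum to one before the entropy is computed; this normalization is our assumption, motivated by the fact that the CANDELS fractions need not sum to one, in which case the raw formula can fall slightly outside [0, 1]. The convention 0 log 0 = 0 is used for classes that received no votes, and the function name is hypothetical.

```python
import numpy as np

def agreement(fractions, normalize=True):
    """Agreement a = 1 - H(f) / log(N) for N vote fractions (N = 5 in CANDELS)."""
    f = np.asarray(fractions, dtype=float)
    if normalize:  # assumption: rescale so that the fractions sum to one
        total = f.sum()
        if total <= 0:
            raise ValueError("at least one fraction must be positive")
        f = f / total
    nonzero = f[f > 0]  # 0 * log(0) is taken as 0
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(1.0 - entropy / np.log(f.size))

a_unanimous = agreement([1.0, 0.0, 0.0, 0.0, 0.0])  # all votes for one class -> 1.0
a_uniform = agreement([0.2, 0.2, 0.2, 0.2, 0.2])    # votes spread evenly -> ~0.0
a_split = agreement([0.5, 0.5, 0.0, 0.0, 0.0])      # two-way split -> 1 - log 2 / log 5
```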

Figure 11 reports the mean classification accuracy, defined as
the match between the automatic dominant class and the visual
dominant class, as a function of a. The agreement parameter a is
computed using both the automatic and visual classifications. As
expected, the accuracy increases with the level of agreement.
Well-defined objects reach an accuracy > 90%, but it drops to
~50% for galaxies with a < 0.2. This behavior is very similar to
the one reported in figure 9 of D15, which confirms that the
classifier behaves similarly at high redshift.

The results above clearly represent a major step forward
compared to CAS-based methods. Firstly, CAS methods are not
able to clearly distinguish between unclassifiable objects and
galaxies, since the morphological parameters of unclassifiable
objects can take essentially arbitrary values. ConvNets identify
them without ambiguity.

A similar issue affects point/compact sources, which usually
fall in the early-type galaxy (ETG) class in CAS methods
unless a prior cleaning is performed. Most importantly,
however, even for the distinction between dominant spheroids
and dominant disks, advanced CAS-based methods such as galSVM
show a tail of dominant disks with high ETG probability and
vice versa (Fig. 12), yielding a ~20% misclassification rate
(Huertas-Company et al. 2014). The situation is more dramatic
for the distinction between dominant irregulars and dominant
disks. It is almost impossible with CAS-based approaches, given
that at high redshift many disks present high asymmetry values
(Huertas-Company et al. 2014). This is clearly shown in the
right panel of Figure 12, where dominant disks have a very wide
irregular probability distribution. ConvNets provide a huge
improvement here by perfectly separating both classes.


Figure 13 shows some example stamps of these 5 DCs selected
in the COSMOS field, where no visual morphologies are available.
Objects are selected at random. The visual aspect of all objects
clearly matches the dominant class in which they fall in the
ConvNet classification, confirming the low misclassification
rate estimated in Figure 10 for
Fig. 6.— Correlation between the fractions of classifiers voting for a given feature (spheroid (top left), disk (top right), irregular (middle left), point source (middle right) and unclassifiable (bottom left)) and the predictions of the ConvNet-based classification on a test dataset. Detailed quantifications of the bias and the dispersion are shown in Table 2.
Test sample
Parameter  Statistic    0<f<0.2  0.2<f<0.4  0.4<f<0.6  0.6<f<0.8  0.8<f<1.0
f_sph      Bias            0.03      -0.01       0.00      -0.05      -0.10
           RMSE            0.09       0.15       0.15       0.17       0.16
           Scatter         0.07       0.14       0.14       0.12       0.09
f_disk     Bias           -0.00       0.11       0.06       0.06      -0.00
           RMSE            0.09       0.17       0.16       0.13       0.09
           Scatter         0.05       0.17       0.15       0.10       0.05
f_irr      Bias            0.01      -0.06      -0.10      -0.12      -0.14
           RMSE            0.06       0.13       0.16       0.20       0.23
           Scatter         0.05       0.13       0.15       0.12       0.12
f_ps       Bias           -0.01      -0.11      -0.10      -0.04      -0.09
           RMSE            0.04       0.14       0.21       0.19       0.16
           Scatter         0.04       0.15       0.21       0.15       0.08
f_unc      Bias           -0.02      -0.17      -0.07       0.19      -0.03
           RMSE            0.03       0.16       0.12       0.23       0.09
           Scatter         0.03       0.21       0.07       0.22       0.02

Training sample
Parameter  Statistic    0<f<0.2  0.2<f<0.4  0.4<f<0.6  0.6<f<0.8  0.8<f<1.0
f_sph      Bias            0.03      -0.02      -0.02      -0.01      -0.07
           RMSE            0.08       0.13       0.15       0.13       0.12
           Scatter         0.06       0.13       0.13       0.10       0.07
f_disk     Bias            0.01       0.07       0.08       0.05      -0.00
           RMSE            0.09       0.15       0.14       0.12       0.08
           Scatter         0.06       0.13       0.12       0.09       0.05
f_irr      Bias            0.00      -0.06      -0.08      -0.08      -0.11
           RMSE            0.05       0.12       0.15       0.16       0.18
           Scatter         0.05       0.12       0.13       0.12       0.10
f_ps       Bias           -0.01      -0.11      -0.16      -0.07       0.01
           RMSE            0.04       0.13       0.18       0.19       0.13
           Scatter         0.03       0.15       0.18       0.14       0.08
f_unc      Bias           -0.02      -0.10      -0.11      -0.01       0.03
           RMSE            0.03       0.14       0.19       0.22       0.22
           Scatter         0.03       0.14       0.17       0.15       0.09

TABLE 1 

Median bias (Δf = ⟨f_auto - f_visu⟩), root mean square error (RMSE) and scatter as a function of the visual morphological frequencies for the test (top) and the training (bottom) sets.
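The per-bin statistics of Tables 1 and 2 can be computed along these lines. A minimal sketch, assuming Δf = f_auto - f_visu per galaxy, bias = median(Δf), RMSE = sqrt(mean(Δf²)) and scatter = standard deviation of Δf (population form, following the sqrt(VAR(Δf)) of figure 9), with bins defined on the visual frequency. The helper name is hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def binned_stats(f_auto: Sequence[float], f_visu: Sequence[float],
                 edges: Sequence[float]) -> List[Dict[str, float]]:
    """Median bias, RMSE and scatter of df = f_auto - f_visu in bins of f_visu."""
    n_bins = len(edges) - 1
    out = []
    for k in range(n_bins):
        lo, hi = edges[k], edges[k + 1]
        last = k == n_bins - 1  # the last bin also includes its upper edge
        df = [a - v for a, v in zip(f_auto, f_visu)
              if lo <= v < hi or (last and v == hi)]
        if not df:
            out.append({"n": 0, "bias": math.nan, "rmse": math.nan, "scatter": math.nan})
            continue
        out.append({
            "n": len(df),
            "bias": statistics.median(df),
            "rmse": math.sqrt(sum(x * x for x in df) / len(df)),
            "scatter": statistics.pstdev(df),  # sqrt(VAR(df))
        })
    return out


# Toy usage with made-up values
stats = binned_stats([0.2, 0.0, 0.8, 0.8], [0.1, 0.1, 0.9, 0.9], [0.0, 0.5, 1.0])
for row in stats:
    print(row)
```

Since RMSE² = mean(Δf)² + VAR(Δf), the RMSE exceeds the scatter whenever the mean offset is non-zero; in the toy data the second bin has zero scatter but an RMSE of about 0.1 driven entirely by the offset.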



Fig. 7.— Same as figure 6 but for objects used for the training.
Test sample
Parameter     Bias  Scatter   RMSE
f_spheroid    0.03     0.09   0.17
f_disk        0.03     0.08   0.15
f_irr        -0.01     0.07   0.14
f_ps         -0.01     0.04   0.10
f_unc        -0.02     0.03   0.07
ALL           0.00     0.05   0.13

Training sample
Parameter     Bias  Scatter   RMSE
f_spheroid    0.02     0.08   0.15
f_disk        0.02     0.08   0.14
f_irr        -0.01     0.06   0.12
f_ps         -0.01     0.04   0.09
f_unc        -0.02     0.03   0.05
ALL          -0.01     0.05   0.12


TABLE 2 

Median bias (Δf = ⟨f_auto - f_visu⟩), scatter and root mean square error (RMSE) for each visual morphological frequency for the test and training samples.



Fig. 8.— Relation between the maximum fraction in the visual and the automatic classifications.
Fig. 9.— Mean bias (Δf = f_auto - f_visu) and scatter (sqrt(VAR(Δf))) of the three main morphological fractions (spheroid, disk and irregular, from top to bottom) as a function of redshift, magnitude and resolution (from left to right).


GOODS-S. 

4.3. Secondary classes - multi-component objects 

Galaxies composed of several distinct structures are also important. We use 2 parameters to identify these objects: the maximum frequency (f_max) and the difference between the largest and the second-largest frequency (Δf_{1-2}). A galaxy with a fairly high f_max and a low Δf_{1-2} should be a galaxy with two clear components. For the purpose of this test, we define these galaxies as those with f_max > 0.5 and Δf_{1-2} < 0.5.

We then look for the three possible combinations of primary and secondary classes: Disk+Spheroid (DS), Disk+Irregular (DI) and Spheroid+Irregular (SI). Figure 14 shows the relation between the 3 two-component classes defined from the visual and the automatic classifications. The agreement is again close to 95% for DSs and DIs, which means that the algorithm is able to identify not only the primary class but also the secondary one, whenever the galaxy has two clear morphological components. The agreement for the SI class is poor. However, this is a very marginal class, since very few objects have both a dominant bulge and an irregular structure. In the automatic classification they are usually associated with bulges surrounded by some kind of structure (fig. 15).
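The two-component selection described above can be sketched as follows. Assumptions: the frequencies are given in the order (spheroid, disk, irregular, point source, unclassifiable), the cuts are the f_max > 0.5 and Δf_{1-2} < 0.5 values quoted in the text, and pairs involving point sources or unclassifiable objects are simply ignored. The function and label names are illustrative, not part of the released catalog.

```python
from typing import Optional, Sequence

# Assumed order of the 5 frequencies in each tuple
NAMES = ("sph", "disk", "irr", "ps", "unc")

# The three combinations of primary and secondary classes studied in the text
PAIR_LABELS = {
    frozenset({"disk", "sph"}): "DS",
    frozenset({"disk", "irr"}): "DI",
    frozenset({"sph", "irr"}): "SI",
}


def two_component_class(fracs: Sequence[float],
                        fmax_min: float = 0.5,
                        df_max: float = 0.5) -> Optional[str]:
    """Return 'DS', 'DI' or 'SI' for two-component objects, else None."""
    ranked = sorted(zip(fracs, NAMES), reverse=True)  # largest frequency first
    (f1, c1), (f2, c2) = ranked[0], ranked[1]
    if f1 > fmax_min and (f1 - f2) < df_max:
        # None if the two largest classes are not one of the three studied pairs
        return PAIR_LABELS.get(frozenset({c1, c2}))
    return None


# Galaxy 10020 of Table 3: f_disk = 0.62, f_irr = 0.42 -> disk + irregular
print(two_component_class((0.34, 0.62, 0.42, 0.1, 0.0)))  # DI
```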

4.4. Uncertain objects - Limitations 

A galaxy for which none of the 5 associated frequencies is large enough (none of the available flags was clearly selected by the majority of the classifiers) should correspond to an object with an uncertain morphology. Identifying these objects can help in understanding the limits of the morphological classification.

Fig. 10.— Relation between the visual and automatic dominant morphological classes for well-defined objects. The sizes of the symbols are proportional to the number of objects. The level of agreement is > 95%.

Figure 16 shows how the fraction of uncertain objects changes with magnitude, redshift and stellar mass for different f_max thresholds, from f_max < 0.4 to f_max < 0.7, i.e. objects whose maximum frequency is below 0.4 and 0.7, respectively.

The number of objects with f_max below 0.4-0.5 is very small (< 5%) for both the visual and automatic classifications, which reflects the fact that the imposed magnitude limit (H < 24.5) allows a main morphology to be identified in most cases.

When the threshold is increased, the expected trends are observed, i.e. the number of uncertain objects increases with magnitude and redshift, and is also higher for lower stellar masses. Interestingly, the trends are very similar for the visual and automatic morphologies. The automated classification therefore reproduces the same uncertainties that the human eye encounters when classifying a galaxy.
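The measurement behind figure 16 amounts to a binned fraction of objects below each f_max threshold. A minimal sketch with hypothetical names, assuming one f_max value and one binning quantity (magnitude, redshift or stellar mass) per galaxy; the numbers in the usage example are made up.

```python
import math
from typing import Dict, List, Sequence


def uncertain_fraction(fmax: Sequence[float], x: Sequence[float],
                       edges: Sequence[float],
                       thresholds: Sequence[float]) -> Dict[float, List[float]]:
    """For each threshold t, fraction of objects with fmax < t in each [lo, hi) bin of x."""
    result = {}
    for t in thresholds:
        fractions = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = [f for f, xv in zip(fmax, x) if lo <= xv < hi]
            if in_bin:
                fractions.append(sum(f < t for f in in_bin) / len(in_bin))
            else:
                fractions.append(math.nan)  # no objects in this bin
        result[t] = fractions
    return result


# Toy usage: H-band magnitudes and f_max values (made-up numbers)
mags = [21.0, 21.5, 24.0, 24.2]
fmax = [0.9, 0.35, 0.6, 0.45]
print(uncertain_fraction(fmax, mags, [20.0, 22.0, 24.5], [0.4, 0.7]))
# {0.4: [0.5, 0.0], 0.7: [0.5, 1.0]}
```

Running the same function on the visual and on the automatic f_max values gives the two sets of curves that are compared in the text.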

In the bottom row of figure 16 we also show the median value of α, the level of agreement between classifiers, in bins of magnitude, redshift and stellar mass. The level of agreement decreases for faint, distant and low-mass objects, as expected. The strongest correlation, however, is with magnitude, indicating that the main limitation in properly classifying a galaxy is the signal-to-noise ratio. Notice also that the median level of agreement is always > 0.4, which according to figure 11 corresponds to an accuracy > 80% for all objects.

5. ACCURACY IN ALL CANDELS FIELDS 

All previous results are based on GOODS-S, where visual classifications are available for training and testing. The main purpose of the present work is to extend the classification to all CANDELS fields, where visual inspection is not yet available. It is therefore important to estimate how the algorithm behaves in these fields without visual classifications.

5.1. Field-to-field homogeneity 

One quick sanity check consists in making sure that there are no significant statistical differences among the morphological distributions in the different fields. We indeed expect all fields to have similar fractions of all morphologies, within cosmic variance, since they have similar depths and are randomly selected. It is true that the CANDELS survey has deep and wide areas, which are observed at different depths. However, in this work we impose a magnitude cut much brighter than the magnitude limit of the survey, so our classification should not be affected by these different depths. Therefore, any significant differences could be a sign of biases in the morphological classifications derived in a given field, and a possible signature of over-fitting problems.

Fig. 11.— Classification accuracy as a function of the level of agreement between classifiers (α). The red line shows the relation when α is computed using the visual classification. The blue line shows the same relation with α computed from the automated classification. The horizontal line indicates the average accuracy.

Fig. 12.— Probability distributions of being early-type (left panel) and irregular (right panel) estimated by galSVM (see Huertas-Company et al. 2014) for three dominant classes in the CANDELS visual classification, as labelled. Dominant disks cannot be reliably separated from dominant irregulars using this approach.

Figure 17 shows the cumulative distribution functions (CDFs) of the different frequencies (f_sph, f_disk, f_irr) in the 5 fields. We do not observe significant differences from field to field in the distribution of frequencies, suggesting that the algorithm behaves in a similar way independently of the field. Recall, however, that the machine tends to smooth the distribution compared to the visual one; in other words, it removes any gaps or abrupt changes. Gaps are instead present in the visual classifications, given the small number of classifiers per object (even after noise addition).
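The visual CDF comparison can be complemented by a quantitative distance between fields. The text does not state that any statistical test was applied; the two-sample Kolmogorov-Smirnov statistic below (the maximum vertical gap between two empirical CDFs) is one standard choice, shown as an illustrative sketch with hypothetical names and made-up values.

```python
from bisect import bisect_right
from itertools import combinations
from typing import Dict, Sequence, Tuple


def ecdf_value(sorted_sample: Sequence[float], x: float) -> float:
    """Empirical CDF: fraction of the (sorted) sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic, sup_x |F_a(x) - F_b(x)|.
    Both ECDFs are step functions that only jump at sample points,
    so it is enough to evaluate them at the pooled sample values."""
    sa, sb = sorted(a), sorted(b)
    return max(abs(ecdf_value(sa, x) - ecdf_value(sb, x)) for x in set(a) | set(b))


def pairwise_ks(fields: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """KS distance for every pair of fields (e.g. f_sph in each CANDELS field)."""
    return {(p, q): ks_distance(fields[p], fields[q])
            for p, q in combinations(sorted(fields), 2)}


# Toy usage with made-up f_sph values
fields = {"GOODS-S": [0.1, 0.5], "UDS": [0.3, 0.5], "EGS": [0.1, 0.5]}
print(pairwise_ks(fields))
```

Note that the KS statistic is sensitive to exactly the gaps discussed above, so it is best applied to the automated frequencies, which are smooth in all fields.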

5.2. UDS visual classification 

During the production of the automated classification presented in this work, the visual classification of the UDS field was finalized using the same classification scheme. Comparing the resulting parameters with the automated results on
Fig. 13.— Example stamps of the 5 dominant morphological classes in the COSMOS/CANDELS field. From top to bottom we show dominant spheroids, dominant disks, dominant irregulars, dominant point sources/compact objects and dominant unclassifiable objects. The stamps were selected fully at random. Recall that COSMOS galaxies have not been used for training the algorithm; they are therefore completely new for the best model. The size of the stamps is 3.8 x 3.8 arcsec.


this field is therefore a fully independent test of the morphologies released in this work, and a definitive test to rule out any over-fitting issues.

Unfortunately, there are important differences between the visual classifications in GOODS-S and UDS that need to be taken into account before performing a fair comparison.

As shown in figure 17, the distribution of the morphological parameters from the ConvNet classification is similar in all fields and mimics the distribution of the visual GOODS-S classification, as expected. The problem is that, while in GOODS-S the number of classifiers per galaxy is roughly homogeneously distributed between 3 and 5, with some galaxies classified by ~50 people, in UDS ~90% of the galaxies are classified by only 3 people and ~5% by 4 (see fig. 18). This difference results in a different distribution of the visual morphological frequencies between UDS and GOODS-S: for most UDS galaxies the frequencies can take only 4 possible values (0, 1/3, 2/3 and 1), and this persists even after the addition of random noise for smoothing (fig. 18). Since the automated classification necessarily follows the distribution on which it was trained, the comparison with the UDS visual classifications will show a larger scatter, which is not due to a failure of the algorithm but to a difference in the inputs.

In order to estimate how much this affects the comparison in UDS, we recomputed the GOODS-S frequencies by randomly keeping only 3 classifiers per galaxy (i.e. discarding the extra classifications whenever more than 3 are available) and compared them with the automated classification, as done in figure 6.
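The 3-classifier test can be sketched as follows. Assumptions: each galaxy's votes are stored as one 0/1 flag vector per classifier (one flag per feature), the frequency of a feature is the fraction of classifiers who flagged it, and the kept classifiers are drawn uniformly at random. The function names are hypothetical. With 3 classifiers every frequency is restricted to 0, 1/3, 2/3 or 1, which is the discreteness discussed for UDS.

```python
import random
from typing import List, Optional, Sequence


def frequencies(votes: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of classifiers flagging each feature (one 0/1 vector per classifier)."""
    n = len(votes)
    return [sum(v[j] for v in votes) / n for j in range(len(votes[0]))]


def subsampled_frequencies(votes: Sequence[Sequence[int]], k: int = 3,
                           rng: Optional[random.Random] = None) -> List[float]:
    """Recompute the frequencies keeping at most k randomly chosen classifiers."""
    rng = rng or random.Random()
    kept = list(votes) if len(votes) <= k else rng.sample(list(votes), k)
    return frequencies(kept)


# Toy usage: 5 classifiers, flags for (spheroid, disk, irregular, point source, unclassifiable)
votes = [(1, 0, 0, 0, 0), (1, 1, 0, 0, 0), (1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 1, 1, 0, 0)]
print(frequencies(votes))                                   # all 5 classifiers
print(subsampled_frequencies(votes, rng=random.Random(0)))  # 3 random classifiers
```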


The results of this exercise are shown in figure 19. In the left column we plot the comparison when all classifiers are taken into account (as in fig. 6), and in the middle column the same comparison using only 3 classifiers. There is a clear increase of the scatter and the bias, which is caused only by the change in the distribution of the input values (the output is exactly the same). Interestingly, the trends are very similar to what is observed in the comparison with UDS (right column), which suggests that the worsening of the results in UDS is not due to a bad behavior of the algorithm on this field, but simply to a different distribution of the inputs.

The latter effect can also be understood if we consider that, to first order, the process of having n classifiers visually choosing between two labels (binary classification) follows a binomial distribution. Let us assume, for example, that an image has an intrinsic probability p of being classified as a spheroid. It follows that the variance of the number of people labelling it as "yes" out of a total of n is $np(1-p)$. Dividing the counts by n, the standard deviation of the visually classified fractions is $\sqrt{p(1-p)/n}$. The deviation of the fractions thus depends on the intrinsic probability p and on the number of annotators: the fewer annotators we have, the higher the variance of the fractions, i.e. the less reliable the probabilities of each class become (compared to the intrinsic one). Training a machine with a noisier training set will therefore also result in a noisier classification.
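The binomial argument above can be checked numerically. This sketch evaluates σ_f = sqrt(p(1 - p)/n) and compares it with a small Monte Carlo simulation of n independent classifiers, each voting "yes" with probability p. It illustrates the scaling with n only; it is not a model of the actual CANDELS votes, and the function names are hypothetical.

```python
import math
import random


def sigma_f(p: float, n: int) -> float:
    """Binomial standard deviation of the fraction of n classifiers voting 'yes'."""
    return math.sqrt(p * (1.0 - p) / n)


def simulated_sigma_f(p: float, n: int, trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    fracs = [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]
    mean = sum(fracs) / trials
    return math.sqrt(sum((f - mean) ** 2 for f in fracs) / trials)


for n in (3, 5, 50):  # most UDS galaxies, typical GOODS-S, best-sampled GOODS-S
    print(n, round(sigma_f(0.5, n), 3), round(simulated_sigma_f(0.5, n), 3))
```

For p = 0.5 the expected deviation drops from about 0.29 with 3 classifiers to about 0.22 with 5 and about 0.07 with 50.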

This issue highlights one main advantage of automated classifications over visual ones when a small number
Fig. 14.— Relation between the visual and automatic two-component classes. The level of agreement is > 95%.


of classifiers is involved: the results are, by construction, homogeneous across all datasets. The fact that UDS and GOODS-S with only 3 classifiers look very similar also suggests that the algorithm has a similar accuracy in both fields, confirming that the classification is not severely affected by over-fitting.

6. CATALOG 

The paper is accompanied by the public release of the morphologies of all galaxies in the CANDELS fields brighter than H_F160W = 24.5. In addition to the 5 morphological parameters, the catalog provides 2 measurements of the quality of the classification discussed in the text (α and Δf_{1-2}), as well as the dominant class and the maximum frequency f_max. Table 3 shows the first few lines of the catalog. The catalog is released through the Rainbow database: http://rainbowx.fis.ucm.es/Rainbow_navigator_public/
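The derived catalog columns (f_max, Δf and DOM_CLASS; see Table 3) follow directly from the 5 frequencies. A minimal sketch with a hypothetical function name, assuming the frequency order (spheroid, disk, irregular, point source, unclassifiable) that matches the DOM_CLASS codes 0-4, and assuming ties go to the lowest class index, which is consistent with galaxy 10036 in Table 3. Catalog values may differ in the last digit (e.g. galaxy 1), presumably because they were computed before rounding.

```python
from typing import Sequence, Tuple


def derived_columns(fracs: Sequence[float]) -> Tuple[float, float, int]:
    """Return (f_max, delta_f, dom_class) from the 5 morphological frequencies.

    dom_class codes: 0 spheroid, 1 disk, 2 irregular, 3 point source, 4 unclassifiable.
    sorted() is stable (also with reverse=True), so on ties the lowest class index ranks first.
    """
    order = sorted(range(len(fracs)), key=lambda i: fracs[i], reverse=True)
    f_max = fracs[order[0]]
    delta_f = round(f_max - fracs[order[1]], 2)
    return f_max, delta_f, order[0]


# Two rows of Table 3 (galaxies 10019 and 10036)
print(derived_columns((0.05, 0.16, 0.98, 0.0, 0.0)))   # (0.98, 0.82, 2)
print(derived_columns((0.44, 0.21, 0.02, 0.44, 0.25)))  # (0.44, 0.0, 0)
```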

The classification provided is by construction continuous, since each galaxy has 5 parameters ranging from 0 to 1. How these parameters are used to actually define morphological classes strongly depends on the science goals and on the galaxy properties one would like to highlight. Establishing thresholds on the different fractions necessarily implies a trade-off between pure and complete samples.

To illustrate how the catalog can be used, we propose here one possible classification into 5 different morphological classes, based on thresholds on the different frequencies (see Huertas-Company et al. 2015a):

• pure bulges [SPH]: f_sph > 2/3 AND f_disk < 2/3 AND f_irr < 1/10

• pure disks [DISK]: f_sph < 2/3 AND f_disk > 2/3 AND f_irr < 1/10

• disk+sph [DISKSPH]: f_sph > 2/3 AND f_disk > 2/3 AND f_irr < 1/10

• irregular disks [DISKIRR]: f_disk > 2/3 AND f_sph < 2/3 AND f_irr > 1/10

• irregulars/mergers [IRR]: f_disk < 2/3 AND f_sph < 2/3 AND f_irr > 1/10
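The thresholds above translate directly into code. A minimal sketch with a hypothetical function name; it applies the cuts literally, so objects that fall exactly on a boundary, or outside all five definitions (for instance f_sph < 2/3, f_disk < 2/3 and f_irr < 1/10), are returned as None.

```python
from typing import Optional

TWO_THIRDS = 2 / 3
ONE_TENTH = 1 / 10


def morph_class(f_sph: float, f_disk: float, f_irr: float) -> Optional[str]:
    """Illustrative 5-class scheme based on thresholds on f_sph, f_disk and f_irr."""
    if f_irr < ONE_TENTH:
        if f_sph > TWO_THIRDS and f_disk < TWO_THIRDS:
            return "SPH"
        if f_sph < TWO_THIRDS and f_disk > TWO_THIRDS:
            return "DISK"
        if f_sph > TWO_THIRDS and f_disk > TWO_THIRDS:
            return "DISKSPH"
    elif f_irr > ONE_TENTH:
        if f_disk > TWO_THIRDS and f_sph < TWO_THIRDS:
            return "DISKIRR"
        if f_disk < TWO_THIRDS and f_sph < TWO_THIRDS:
            return "IRR"
    return None  # boundary values or combinations not covered by the scheme


# Rows of Table 3: (ID, f_spheroid, f_disk, f_irr)
rows = [(1000, 0.73, 0.12, 0.08), (10001, 0.11, 1.0, 0.01), (10024, 0.84, 1.0, 0.0),
        (10003, 0.25, 0.88, 0.22), (10019, 0.05, 0.16, 0.98)]
for gid, fs, fd, fi in rows:
    print(gid, morph_class(fs, fd, fi))
```

Tightening or loosening the 2/3 and 1/10 cuts moves galaxies between these classes and the unassigned group, which is the purity/completeness trade-off mentioned above.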
Fig. 15.— Example stamps of objects with two main morphological classes in the COSMOS/CANDELS field. From top to bottom we show spheroids+disks, disks+irregulars and spheroids+irregulars. The stamps were selected fully at random. Recall that COSMOS galaxies have not been used for training the algorithm; they are therefore completely new for the best model. The size of the stamps is 3.8 x 3.8 arcsec.


ID     IAU NAME                   RA           DEC         Filter  f_spheroid  f_disk  f_irr  f_ps  f_unc  f_max  Δf    DOM_CLASS  α
1      HCPG J142112.26+5303004.5  215.3011017  53.051239   f160    0.1         0.1     0.17   0.0   0.72   0.72   0.54  4          0.38
1000   HCPG J142051.15+5300016.8  215.2131348  53.0046539  f160    0.73        0.12    0.08   0.37  0.0    0.73   0.36  0          0.34
10001  HCPG J141955.98+5253037.2  214.9832611  52.8936768  f160    0.11        1.0     0.01   0.0   0.0    1.0    0.89  1          0.82
10002  HCPG J142044.89+5301059.4  215.187027   53.0331574  f160    0.57        1.0     0.0    0.01  0.0    1.0    0.43  1          0.78
10003  HCPG J142013.52+5256044.1  215.0563202  52.9455872  f160    0.25        0.88    0.22   0.01  0.03   0.88   0.63  1          0.4
10004  HCPG J141924.91+5248004.0  214.8538055  52.8011017  f160    0.84        0.16    0.06   0.24  0.01   0.84   0.59  0          0.39
10005  HCPG J142025.18+5258045.7  215.1049042  52.9793701  f160    0.34        0.92    0.16   0.0   0.0    0.92   0.58  1          0.53
10010  HCPG J141906.89+5244043.3  214.778717   52.7453613  f160    0.34        1.0     0.09   0.0   0.0    1.0    0.66  1          0.64
10015  HCPG J141859.26+5243018.4  214.746933   52.7217865  f160    0.19        0.97    0.18   0.0   0.0    0.97   0.78  1          0.59
10017  HCPG J142009.87+5256005.7  215.0411224  52.934906   f160    0.33        0.95    0.09   0.02  0.0    0.95   0.62  1          0.55
10018  HCPG J141927.56+5248031.8  214.8648376  52.8088379  f160    0.0         0.95    0.14   0.0   0.0    0.95   0.81  1          0.78
10019  HCPG J141952.59+5253001.8  214.9691162  52.8838196  f160    0.05        0.16    0.98   0.0   0.0    0.98   0.82  2          0.71
10020  HCPG J142037.78+5301000.3  215.1574097  53.0167541  f160    0.34        0.62    0.42   0.1   0.0    0.62   0.2   1          0.22
10024  HCPG J141917.09+5246040.1  214.8211975  52.7778015  f160    0.84        1.0     0.0    0.01  0.0    1.0    0.16  1          0.89
10026  HCPG J141922.45+5247042.5  214.8435364  52.7951355  f160    0.47        0.94    0.13   0.0   0.01   0.94   0.47  1          0.55
10027  HCPG J141938.69+5250035.2  214.9111938  52.8431091  f160    0.11        0.9     0.16   0.04  0.05   0.9    0.74  1          0.44
10029  HCPG J142055.91+5304013.0  215.2329407  53.070282   f160    0.78        0.13    0.02   0.34  0.01   0.78   0.44  0          0.42
1003   HCPG J142011.48+5253015.9  215.0478363  52.8877411  f160    0.58        0.85    0.15   0.04  0.01   0.85   0.27  1          0.45
10032  HCPG J142027.07+5259005.7  215.112793   52.9849091  f160    0.18        0.66    0.56   0.0   0.0    0.66   0.1   1          0.44
10035  HCPG J141938.21+5250030.9  214.9091949  52.841919   f160    0.22        0.91    0.17   0.04  0.0    0.91   0.69  1          0.48
10036  HCPG J141939.83+5250048.2  214.9159393  52.8467102  f160    0.44        0.21    0.02   0.44  0.25   0.44   0.0   0          0.08


TABLE 3 

Sample of the morphological catalog released with the paper. In addition to the 5 main morphological indicators, we provide for each galaxy two measurements of the quality of the classification: α, the level of agreement between classifiers (linked to the entropy; see text for details), and Δf, the difference between the two largest frequencies. DOM_CLASS gives the dominant class (the class with the maximum frequency): 0 = spheroid, 1 = disk, 2 = irregular, 3 = point source and 4 = unclassifiable. The catalog can be downloaded from the Rainbow database: http://rainbowx.fis.ucm.es/Rainbow_navigator_public/


The thresholds are obviously arbitrary, but they have been calibrated through visual inspection to make sure that they result in different morphological classes. The SPH class contains galaxies fully dominated by the bulge component, with little or no disk at all. The DISK class is made of galaxies in which the disk component dominates over the bulge. Between the two lies the DISKSPH class, in which we put galaxies with no clear dominant component. We then distinguish 2 types of irregulars: DISKIRR, i.e. disk-dominated galaxies with some asymmetric features, and IRR, i.e. irregular galaxies with no clear dominant disk component (including mergers).


Some random example stamps in the COSMOS field are shown in figure 20. Also for illustration purposes, we show in figures 21 to 25 the Sersic index distributions and UVJ planes for galaxies with M*/M⊙ > 10^^, split into different morphological types and for several redshift bins. The expected trends are observed in both sets of figures and are also very similar to the distributions shown by Kartaltepe et al. (2014), on which our classification is based.

We indeed observe that the different morphological types have very different Sersic index distributions. Objects with a clear bulge component according to their visual inspection (spheroids and bulge+disk systems) tend to have larger Sersic indices and also tend to be located in the passive region of the UVJ plane. Disk-dominated objects peak at n ~ 1 and are star-forming based on their locus in the UVJ plane.

Fig. 16.— Fraction of uncertain objects defined for different f_max thresholds, as labelled, in the automatic (top) and visual (middle) classifications. The fraction of uncertain objects increases for fainter objects, higher redshifts and lower masses. Similar trends are recovered in both classifications. The bottom row shows the relation between the level of agreement α (see text) and magnitude (left), redshift (middle) and stellar mass (right).

One interesting class is the disk+spheroid class (i.e. objects with no clear dominant disk or spheroidal component), since these objects do not have a clear locus in the UVJ diagram: roughly half of them are passive and the other half are star-forming. Any selection based on star-formation activity will therefore split this population into two groups. A purely morphological classification makes it possible to isolate objects that are difficult to identify with colors and/or single-profile fitting. It is also interesting to notice that the large morphological catalog put together in this paper allows objects that deviate from the general trends (i.e. passive disks, star-forming bulges) to be studied with reasonable statistics (see fig. 26).

7. SUMMARY AND CONCLUSIONS 

This work presents a visual-like morphological classification of ~50,000 galaxies (H < 24.5) in the 5 CANDELS fields (GOODS-S, GOODS-N, UDS, COSMOS and EGS) in the H band, which probes optical rest-frame morphologies in the redshift range 1 < z < 3. The sample is ~80% complete down to log(M*/M⊙) ~ 10.

Morphologies are estimated with a 5-layer Convolutional Neural Network (ConvNet) followed by 2 layers of fully connected perceptrons, trained to reproduce the visual morphologies of ~8000 galaxies in GOODS-S published by the CANDELS collaboration (Kartaltepe et al. 2014). ConvNets are a particular family of neural networks that take advantage of image stationarity to mimic the way human brain cells behave when recognizing specific patterns.

Fig. 17.— Cumulative distribution functions (CDFs) of f_sph (left), f_disk (middle) and f_irr (right) derived in all 5 CANDELS fields, as labelled. We also show in black the CDF of the visual classifications in GOODS-S (after addition of random noise). There are no major differences between the fields, and the distributions follow those of the visual classification.

Fig. 18.— Left: number of visual classifiers per galaxy in the UDS and GOODS-S fields; 90% of the galaxies in UDS are classified by only 3 people. Middle: CDFs of the main morphological parameters in UDS and GOODS-S. Right: same CDFs after addition of Gaussian noise.

Following the approach in CANDELS, we associate to each galaxy 5 real numbers, f_spheroid, f_disk, f_irr, f_ps and f_unc, corresponding respectively to the frequency at which expert classifiers flagged a galaxy as having a bulge, having a disk, presenting an irregularity, being compact or a point source, and being unclassifiable. Galaxy images are interpolated to a fixed size, rotated and randomly perturbed before being fed to the network, in order to (i) avoid over-fitting and (ii) reach a comparable ratio of background vs. galaxy pixels in all images.

ConvNets are able to predict the votes of expert classifiers with a < 10% bias and a ~10% scatter. This makes the classification almost equivalent to a visual one. The training took 10 days on a GPU, and the classification is performed at a rate of 1000 galaxies/hour. As opposed to generalized CAS methods (e.g. galSVM), ConvNets are able to identify without ambiguity (< 1% misclassifications) objects that are not galaxies (high f_unc values), and to distinguish irregulars from disks at all redshifts, as well as spheroids from disks.

The catalog of ~50,000 galaxies is released with the present paper through the Rainbow database: http://rainbowx.fis.ucm.es/Rainbow_navigator_public/. The catalog increases the existing public morphologies in the CANDELS fields by a factor of 5 and is intended to be used for many diverse scientific applications (e.g. evolution of merger rates, morphological evolution from z ~ 3, the morphology-density/environment relation, the morphology-AGN connection, etc.).

Future efforts will focus on optimizing deep-learning-based approaches like the one presented here for Euclid/WFIRST/LSST-like data, on analyzing deeper data such as the Hubble Frontier Fields, and on providing more detailed morphological descriptors in CANDELS (e.g. tidal features).
Fig. 19.— Correlation between the fractions of classifiers voting for a given feature. Left: GOODS-S when all classifiers are considered. Middle: GOODS-S with only 3 classifiers. Right: UDS, where 90% of galaxies are classified by 3 people. The trends observed in the middle and right columns are similar for all parameters, suggesting that the worsening of the results observed in UDS is due to a difference in the input catalog.
Fig. 20.— Example stamps of the 5 morphological classes defined for illustration in the COSMOS/CANDELS field. From top to bottom we show spheroids, disks, disk+spheroids, irregular disks and irregulars. The galaxies were selected fully at random. Recall that COSMOS galaxies have not been used for training the algorithm; they are therefore completely new for the best model. The size of the stamps is 3.8 x 3.8 arcsec.
Fig. 21.— Sersic index distribution for different morphological types, as labeled. We show galaxies with M*/M⊙ > 10^^. Each panel shows a different redshift bin. The expected trends are observed, i.e. bulge-dominated systems tend to have high Sersic indices, while diskier galaxies peak at lower values.
Fig. 22.— UVJ plane for M*/M⊙ > 10^^ galaxies in different redshift bins, as labeled. Red dots show spheroids and gray points show all other galaxies. The red lines show the location of passive galaxies according to Whitaker et al. (2012).
Fig. 23.— UVJ plane for M*/M⊙ > 10^^ galaxies in different redshift bins, as labeled. Brown dots show disk+spheroid systems and gray points show all other galaxies. The red lines show the location of passive galaxies according to Whitaker et al. (2012).
Fig. 24.— UVJ plane for M*/M⊙ > 10^^ galaxies in different redshift bins, as labeled. Blue dots show disks and gray points show all other galaxies. The red lines show the location of passive galaxies according to Whitaker et al. (2012).
Fig. 25.— UVJ plane for M*/M⊙ > 10^^ galaxies in different redshift bins, as labeled. Green and violet dots show irregular and disk/irregular galaxies, respectively, and gray points show all other galaxies. The red lines show the location of passive galaxies according to Whitaker et al. (2012).



Fig. 26.— Example stamps of star-forming spheroids (top row) and passive disks (bottom row). For each galaxy we show the Sersic index and the redshift.
























































